HIGHER RADIX MULTIPLIER WITH SIMPLIFIED
PARTIAL PRODUCT GENERATOR 



Inventors: David Matula, Peter-Michael Seidel, and Lee McFearin 



FIELD OF THE INVENTION 

This invention relates in general to the field of methods for design of digital devices. 
More particularly, the invention relates to the design of a high precision multiplier for an 
arithmetic unit of a digital processor. 

BACKGROUND OF THE INVENTION

The arithmetic unit is one of the most important components of any integrated electronic
data processing system. A high precision multiplier is a fundamental part of the arithmetic unit
as multiplication is one of the most frequently performed arithmetic operations. Multipliers are
large and relatively slow blocks in most processors, and for this reason their implementation has
a critical impact on cycle time and processor area.

Without limiting the scope of the invention, this background is provided on the
terminology associated with multipliers for arithmetic units employing what is typically termed
higher radix Booth recodings of the multiplier.

For a bit-string $a = a[p-1:0] = (a[p-1], \ldots, a[0]) = (a_{p-1}, \ldots, a_0) \in \{0,1\}^p$ we denote by
$\langle a \rangle = \sum_{i=0}^{p-1} a[i] \cdot 2^i$ the binary number represented by $a$.

A $p' \times p$-multiplier is a circuit with $p'$ inputs $a = a[p'-1:0]$, $p$ inputs $b = b[p-1:0]$,
and $p'+p$ outputs $c = c[p'+p-1:0]$, such that $\langle a \rangle \cdot \langle b \rangle = \langle c \rangle$ holds.

For binary multiplication, in general, binary $p' \times p$-multipliers are implemented with a
first step of generating the product in a carry-save representation and a second step of
compressing the carry-save representation to a binary representation of the product. We focus on
implementation of the first step. In the simplest form the product computation is based on the
sum:

$$\langle a \rangle \cdot \langle b \rangle = \sum_{i=0}^{p-1} \langle a \rangle \cdot b[i] \cdot 2^i \qquad (1)$$

with the partial products $S_i = \langle a \rangle \cdot b[i] \cdot 2^i$. These partial products have to be generated and
compressed to a carry-save representation. The generation of the partial products corresponding
to (1) simply consists of logical AND-gates. Except for optimizing the logical and physical
implementation of the partial product reduction, the main approach to decrease the delay and size
of the partial product reduction is to decrease the number of partial products in (1) by
representing one of the operands in a higher radix.
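As a minimal sketch of the radix-2 sum in (1), the following Python fragment accumulates the partial products one bit at a time. The function names are illustrative only and do not come from the specification, and ordinary integer addition stands in for the carry-save compression.

```python
def bits_lsb_first(value, width):
    """Return the bits value[0], value[1], ..., value[width-1]."""
    return [(value >> i) & 1 for i in range(width)]


def multiply_by_partial_products(a, b, p):
    """Accumulate the p radix-2 partial products S_i = a * b[i] * 2**i."""
    total = 0
    for i, bit in enumerate(bits_lsb_first(b, p)):
        partial = (a if bit else 0) << i  # AND with b[i], then weight by 2**i
        total += partial
    return total


print(multiply_by_partial_products(13, 11, 4))  # 143
```

Each loop iteration corresponds to one row of AND-gates, so a p-bit multiplier operand yields p rows to be reduced; this is the count that the higher radix recodings below aim to lower.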

For higher radix partial product generation, let a $p$-digit string in radix $\beta$ denote the radix
polynomial

$$d_{p-1} \beta^{p-1} + d_{p-2} \beta^{p-2} + \ldots + d_0 \in P[\beta, D]$$

Here, $\beta \ge 2$ is the radix, $D$ is the digit set with $d_i \in D$ for $0 \le i \le p-1$, and the radix system
$P[\beta, D]$ denotes the set of all radix polynomials with radix $\beta$ and digits from $D$.

For the product $A \cdot B$ of the $p$-bit integer multiplicand $A = \langle a \rangle$ and the $p$-bit integer
multiplier $B = \langle b \rangle$, let the multiplier be represented by a $\mu = \lceil (p+1)/k \rceil$ digit polynomial in the
balanced minimally redundant (Booth digit) system $P[2^k, \{-2^{k-1}, \ldots, 2^{k-1}\}]$. Each term of
the product

$$A \cdot B = \sum_{i=0}^{\mu-1} A \cdot d_i \cdot 2^{ki}$$

is a higher radix partial product of the form

$$A \cdot d_i \cdot 2^{ki} = (-1)^s \cdot 2^n \cdot (jA), \quad j \in \{1, 3, \ldots, 2^{k-1} - 1\}.$$
Radix   Primitive Part.   # Partial   PPG     Fanout of         Total PPG
        Prods (jA)        Products    Fanin   Primitive PP's    Fanin

  2           1              64         1          64               64
  4           1              33         1          33               33
  8           2              22         2          22               44
 16           4              17         4          17               68
 32           8              13         8          13              124

Table 1: Complexity of partial product generation for various Booth recodings (p=64).

This allows each higher radix partial product to be created from a set of $2^{k-2}$ primitive
partial products $(jA)$ by a conditional shift and/or complement. Five metrics are provided in
Table 1 for comparing the consequences of employing higher radix Booth recodings on a 64-bit
operand. The number of primitive partial products that must be computed and routed to each
partial product generator (PPG) grows linearly with the base $\beta$ while the number of partial
products that must be driven to each PPG decreases inversely with $\lg \beta$. A measure of multiplier
circuit routing complexity is the total PPG-fanin given by the number of primitive partial
products that must be routed into each PPG summed over all PPGs. The necessity of routing
each primitive partial product $(jA)$ to each of the PPGs causes the total PPG-fanin to grow for
$\beta > 4$.
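The Booth digit system described above can be illustrated with a short Python sketch. It recodes an unsigned p-bit multiplier into radix-2^k Booth digits and splits each nonzero digit into sign, shift, and odd primitive multiple. The helper names `booth_digits` and `primitive_form` are hypothetical, and the overlapping-bit formula used is the standard Booth rule, assumed here to match the recoding the text has in mind.

```python
def booth_digits(b, p, k):
    """Recode the unsigned p-bit integer b into ceil((p+1)/k) Booth digits (radix 2**k)."""
    def bit(j):
        return (b >> j) & 1 if 0 <= j < p else 0  # bits outside b[p-1:0] read as 0

    digits = []
    for i in range(-(-(p + 1) // k)):  # ceil((p+1)/k) digits
        d = -(1 << (k - 1)) * bit(k * i + k - 1)             # negatively weighted top bit
        d += sum(bit(k * i + j) << j for j in range(k - 1))  # remaining bits of the group
        d += bit(k * i - 1)                                  # overlap bit from the group below
        digits.append(d)
    return digits


def primitive_form(d):
    """Split a nonzero digit into (s, n, j) with d == (-1)**s * 2**n * j and j odd."""
    s, m, n = int(d < 0), abs(d), 0
    while m % 2 == 0:
        m //= 2
        n += 1
    return s, n, m


for k in (2, 3, 4, 5):
    print(2 ** k, len(booth_digits(0xDEADBEEFCAFEF00D, 64, k)))
```

For p = 64 the loop prints the digit counts 33, 22, 17 and 13 for radices 4, 8, 16 and 32, which agree with the "# Partial Products" column of Table 1.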

Radix-4 has a clear advantage in these metrics over the host radix-2. The reduction by 
one half in the number of partial products and total PPG-fanin is obtained simply by the facility 
of the PPG units to conditionally complement and/or perform a one bit shift. 

Moving to radix-8 further reduces the number of partial products by a third while adding
the complexity and delay of a 2-1 add to precompute an additional primitive partial product $(3A)$.
This tradeoff is more acceptable for higher precisions in terms of adder complexity, but the
routing of the two primitive values to each PPG increases the total PPG fanin.

Moving to radices 16 and 32 only marginally reduces the number of partial products
while greatly increasing the partial product complexity both in number of primitive partial
products and total fanin to the PPGs.

With this background we then note that prior art implementations of multipliers generally
employ either radix-4 or radix-8, and it is helpful to focus on the features and disadvantages of
these systems. Figure 1A provides a block diagram of a prior art Booth radix-4 multiplier and
Figure 1B provides a block diagram of a prior art Booth radix-8 multiplier for comparing
features.

Implementation of a $(p' \times p)$-bit multiplier involves the accumulation of $p$ partial products
of the $p'$-bit multiplicand. Booth recoding of the multiplier to radix 4 with digits {-2,-1,0,1,2}
reduces the number of partial products to $\lceil (p+1)/2 \rceil$, where each partial product is obtained by
shifting and/or complementing the multiplicand for each non zero digit. Radix 4 recoding thus
reduces the multiplier size by about 50% and provides more flexibility in reducing the cycle time
of an implementation.

Booth radix 8 multiplier recoding with digits {-4,-3,-2,-1,0,1,2,3,4} reduces the number
of partial products to $\lceil (p+1)/3 \rceil$, realizing a further 33% reduction in the number of partial
products. However, radix 8 recoded multipliers have several inherent disadvantages over radix 4
recoded multipliers. In particular:

(i) Multiplier precomputation: Radix 8 recoding requires the area and delay 
of a carry propagate adder to precompute a 3x-multiplicand partial 
product. 

(ii) Multiplicand routing: Both the 1x-multiplicand and 3x-multiplicand
are high precision operands that must be routed to substantially all partial
product generators (PPGs).

(iii) Partial product generation: The partial product generator must select
either the 1x or 3x multiplicand as well as shift and/or complement the
term for each non-zero digit.

Accordingly, a need has arisen for a multiplier method and design reducing the number
of partial products to be accumulated below $\lceil (p+1)/2 \rceil$ while avoiding the precomputation adder
area and delay of obtaining a 3x multiplicand. A further need has arisen to avoid the
multiplicand routing and PPG selection complexity of sending two distinct multiples of the
multiplicand to each PPG.

SUMMARY OF THE INVENTION

In accordance with the present invention a multiplier recoding and partial product
generation method and multiplier system are provided where the number of partial products can
be reduced to $\lceil 2(p+1)/5 \rceil + 2$, achieving over a 15% further reduction over Booth radix-4 for
sufficiently large $p$, where no precomputed multiple of the multiplicand need be determined,
and where each PPG receives only the input multiplicand for selective shift and/or
complementation.

We obtain these properties by a novel operand recoding scheme for the implementation
of higher radix multiplication. In particular, we investigate the design of radix-32 and
radix-256 multipliers. For these higher radix multipliers each of the higher radix digits is recoded
employing a secondary radix. Specifically, each radix-32 digit is represented by two radix-7
digits, and each radix-256 digit is represented by three radix-11 digits. Hence, a $p$-bit multiplier
is represented by roughly $2p/5$ and $3p/8$ terms, respectively. The novel feature of our secondary radix
recoding is that all the non zero digit magnitudes are a power of two, which simplifies the
implementation of the partial product generation. The partial products depending on multiples of
the radices 7 or 11 can be separately accumulated, with multiplication by the radix as a pre- or
post-computation option.

An important technical advantage of the present method is that the multiplication by the
secondary radix can be accomplished within the structure of the PPR adder tree requiring the
accumulation of only a couple of extra partial products (two for secondary radix-7) with that
number independent of the overall multiplier size. A second technical advantage of the present
invention is the simplified routing achieved by having only one version of the multiplicand
routed to each PPG. For a very large $(p' \times p)$-bit multiplier where $p'$ may be twice or more the
size of $p$, the need to send only one $p'$-bit operand to each of a reduced number of PPGs is a clear
advantage. A third feature of our preferred secondary radix-7 implementation is that half of the
PPGs are standard Booth radix-4 PPGs and the other half are simplified versions of Booth radix-8
PPGs accepting only the digits {-4,-2,-1,0,1,2,4}.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1A presents a schematic diagram for a prior art multiplier design employing Booth
radix-4 recodings and Figure 1B presents a schematic diagram for a prior art multiplier design
employing Booth radix-8 recodings.

Figure 2A presents a simplified schematic diagram of partial product generation for radix
32 recoding with precomputation of the 7x multiple and Figure 2B presents a simplified schematic
diagram of partial product generation for a preferred embodiment radix 32 recoding with
postcomputation of the 7x multiple.

Figure 3 is a detailed block diagram of a presently-preferred embodiment with primary
radix 32, secondary radix 7, and with multiplication by 7 incorporated in a portion of the PPR
tree.

Figure 4 is a detailed block diagram of a further embodiment also using radix 32, with
multiplication of the multiplicand by the secondary radix shown as a precomputation step.

DETAILED DESCRIPTION OF THE INVENTION 

The multiplier unit of the present invention uses a novel method for multiplier recoding
and for computing and accumulating the partial products. In particular an initial high binary
power radix is chosen and the standard Booth digits for that radix are themselves recoded using a
secondary radix.

Secondary Radix Operand Recoding 

Consider a minimally redundant (Booth) radix polynomial representation $B = \sum_i d_i 2^{ki}$
for very high $k$, in particular $5 \le k \le 12$. The Booth digit ranges expand from $-16 \le d_i \le 16$ for
$k = 5$ up to $-2048 \le d_i \le 2048$ for $k = 12$. To reduce the partial product complexity these large
digit ranges are represented by a two to four digit number in a secondary radix $\gamma$ where the
non-zero digits $d \in D^*$ for the secondary radix system $P[\gamma, D^*]$ are exclusively restricted to signed
binary powers $d = (-1)^s \cdot 2^n$ for $s = 0, 1$; $n = 0, 1, 2, \ldots$. The possibility of such systems will first
be illustrated by a simple example.

Our primary radix is $\beta = 32$ with Booth digit set $D = \{-16, -15, \ldots, 16\}$. Our secondary
radix is $\gamma = 7$ with a digit set {-4,-2,-1,0,1,2,4} where digit values are only signed binary powers
or zero. Table 2 provides a lookup table illustrating the recoding as a three step process. The
first column provides the 6-bit inputs which are taken to have weights -16,8,4,2,1,1
corresponding to a leading 5-bit 2's complement integer followed by a low order bit carry in.
Higher radix representation of the integer input value is first given in the table as a single
"Booth" digit radix-32 and then as a two digit radix-7 value which is uniquely determined for
$-16 \le d_i \le 16$ by

$$d_i = d_{i,1} \cdot 7 + d_{i,0}, \quad d_{i,1}, d_{i,0} \in \{-4,-2,-1,0,1,2,4\}.$$

As a third step the radix-7 digits are finally given in Table 2 in encoded form using a sign
and magnitude select bit encoding. Our 64x64 bit product $AB$ using a secondary radix
representation for $B$ can be expressed as

$$AB = (7A) \cdot \left( \sum_{i=0}^{12} d_{i,1} 32^i \right) + (A) \cdot \left( \sum_{i=0}^{12} d_{i,0} 32^i \right). \qquad (2)$$

The right hand side of (2) has 26 partial products, achieving a reduction more than
halfway between that of Booth radix-4 and radix-8. These 26 partial products are partitioned
into two groups, 13 of which employ the primitive partial product $(7A)$, and 13 of which employ
$(A)$, giving a total PPG fanin of only 26. Two options are possible with these simplified partial
products, noting that $d_{i,j} = (-1)^s 2^n$ or 0 for all $0 \le i \le 12$, $0 \le j \le 1$.
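The uniqueness of the two digit radix-7 form can be checked by brute force. The sketch below (with the illustrative names `DIGITS` and `radix7_pairs`, which are not from the specification) enumerates every pair of allowed digits for each Booth radix-32 digit value.

```python
DIGITS = (-4, -2, -1, 0, 1, 2, 4)  # zero or signed powers of two


def radix7_pairs(d):
    """All pairs (d1, d0) of allowed digits with d == 7*d1 + d0."""
    return [(d1, d0) for d1 in DIGITS for d0 in DIGITS if 7 * d1 + d0 == d]


for d in range(-16, 17):
    pairs = radix7_pairs(d)
    assert len(pairs) == 1  # exactly one representation per radix-32 digit
    print(d, pairs[0])
```

For example, 3 is obtained only as 1·7 + (-4) and 10 only as 2·7 + (-4), and in every case the high order digit falls in {-2,-1,0,1,2}.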

Pre-compute $(7A)$. The primitive partial product can be precomputed by a shift and add
$(7A = 8A - A)$ while the $d_{i,j}$ are obtained from a recoder or recoding table.

Post-compute $(7A)$. The higher order summation can utilize a 13:2 adder tree
compressing $\sum_{i=0}^{12} A (d_{i,1} 32^i)$ to a redundant (e.g. carry save) sum $z$. Then the post computation
can add $8z - z$ to the low order sum $y = \sum_{i=0}^{12} A (d_{i,0} 32^i)$ output from a second 13:2 adder tree.
The value of $8z - z + y$ is completed by a 6:2 compressor and a 2-1 addition.
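The post-computation option can be modelled in a few lines of Python. This is a behavioural sketch only: the 3:2 adder chain below is an assumed stand-in for the 13:2 and 6:2 trees (it is not a structural model of them), and the function names are hypothetical. Python's unbounded two's complement integers let the same bitwise identity cover negative partial products.

```python
def csa(x, y, z):
    """3:2 carry-save adder: returns (s, c) with s + c == x + y + z."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def reduce_to_carry_save(terms):
    """Compress integers to a redundant (sum, carry) pair using 3:2 adders."""
    terms = list(terms) + [0, 0]  # padding guarantees at least two terms
    while len(terms) > 2:
        x, y, z = terms.pop(), terms.pop(), terms.pop()
        terms.extend(csa(x, y, z))
    return terms[0], terms[1]


def post_compute_product(a, hi_digits, lo_digits):
    """Evaluate 8z - z + y with z and y held in carry-save form."""
    z = reduce_to_carry_save(a * d * 32 ** i for i, d in enumerate(hi_digits))
    y = reduce_to_carry_save(a * d * 32 ** i for i, d in enumerate(lo_digits))
    six = [8 * z[0], 8 * z[1], -z[0], -z[1], y[0], y[1]]  # six terms enter the 6:2 step
    s, c = reduce_to_carry_save(six)
    return s + c  # final 2-1 (carry-propagate) addition
```

With `hi_digits = [1, -2]` and `lo_digits = [-4, 2]` the recoded multiplier is 3 - 12·32 = -381, and the function returns `a * -381`. Multiplying the redundant pair z by 7 adds only the terms 8z and -z, which is why two extra partial products suffice regardless of operand width.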
binary plus carry input   higher radix conversion            secondary radix-7 encoding
b[5i+4:5i-1]              digit radix-32   digits radix-7    s7   x14  x7   s1   x4   x2   x1
                                           (d_i,1  d_i,0)

100000                        -16            -2   -2         1    1    0    1    0    1    0
100001,100010                 -15            -2   -1         1    1    0    1    0    0    1
100011,100100                 -14            -2    0         1    1    0    dc   0    0    0
100101,100110                 -13            -2    1         1    1    0    0    0    0    1
100111,101000                 -12            -2    2         1    1    0    0    0    1    0
101001,101010                 -11            -1   -4         1    0    1    1    1    0    0
101011,101100                 -10            -2    4         1    1    0    0    1    0    0
101101,101110                  -9            -1   -2         1    0    1    1    0    1    0
101111,110000                  -8            -1   -1         1    0    1    1    0    0    1
110001,110010                  -7            -1    0         1    0    1    dc   0    0    0
110011,110100                  -6            -1    1         1    0    1    0    0    0    1
110101,110110                  -5            -1    2         1    0    1    0    0    1    0
110111,111000                  -4             0   -4         dc   0    0    1    1    0    0
111001,111010                  -3            -1    4         1    0    1    0    1    0    0
111011,111100                  -2             0   -2         dc   0    0    1    0    1    0
111101,111110                  -1             0   -1         dc   0    0    1    0    0    1
111111,000000                   0             0    0         dc   0    0    dc   0    0    0
000001,000010                   1             0    1         dc   0    0    0    0    0    1
000011,000100                   2             0    2         dc   0    0    0    0    1    0
000101,000110                   3             1   -4         0    0    1    1    1    0    0
000111,001000                   4             0    4         dc   0    0    0    1    0    0
001001,001010                   5             1   -2         0    0    1    1    0    1    0
001011,001100                   6             1   -1         0    0    1    1    0    0    1
001101,001110                   7             1    0         0    0    1    dc   0    0    0
001111,010000                   8             1    1         0    0    1    0    0    0    1
010001,010010                   9             1    2         0    0    1    0    0    1    0
010011,010100                  10             2   -4         0    1    0    1    1    0    0
010101,010110                  11             1    4         0    0    1    0    1    0    0
010111,011000                  12             2   -2         0    1    0    1    0    1    0
011001,011010                  13             2   -1         0    1    0    1    0    0    1
011011,011100                  14             2    0         0    1    0    dc   0    0    0
011101,011110                  15             2    1         0    1    0    0    0    0    1
011111                         16             2    2         0    1    0    0    0    1    0

Table 2: Primary radix-32, secondary radix-7 operand digit recoding.
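The three steps tabulated in Table 2 can be reproduced programmatically. The following Python sketch is illustrative only: the function names are hypothetical, and a don't-care sign bit ("dc" in the table) is represented by `None`.

```python
LOW = (-4, -2, -1, 0, 1, 2, 4)


def radix32_digit(bits6):
    """bits6 = (b4, b3, b2, b1, b0, b_-1) with weights -16, 8, 4, 2, 1, 1."""
    return sum(w * b for w, b in zip((-16, 8, 4, 2, 1, 1), bits6))


def encode_radix7(d):
    """Return (s7, x14, x7, s1, x4, x2, x1) for -16 <= d <= 16."""
    hi = next(h for h in range(-2, 3) if d - 7 * h in LOW)  # unique high digit
    lo = d - 7 * hi
    s7 = None if hi == 0 else int(hi < 0)  # sign of the 7-weighted digit
    s1 = None if lo == 0 else int(lo < 0)  # sign of the unit-weighted digit
    return (s7, int(abs(hi) == 2), int(abs(hi) == 1),
            s1, int(abs(lo) == 4), int(abs(lo) == 2), int(abs(lo) == 1))


print(encode_radix7(radix32_digit((1, 0, 0, 0, 0, 0))))  # (1, 1, 0, 1, 0, 1, 0)
```

The printed tuple corresponds to the first row of the table (input 100000, digit -16, radix-7 digits -2 and -2).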
Note that the post-computation preferred embodiment option utilizes only two more
partial products and one additional level of 3-to-2 adder delay to avoid the complexity of a 2-1
adder to pre-compute $(7A)$. If multiplier digit recoding is performed in the first cycle of a
pipelined multiplier, the post-computation option allows the product to be fed back as the
multiplicand of a dependent multiply operation entering on the second cycle. This effectively
reduces pipeline stall by one cycle on dependent multiplications.

The theory of secondary radix recodings is developed in the literature, e.g., Proc. IEEE
Int. Symp. on Comp. Arith. (Arith15), Binary Multiplication Radix-32 and Radix-256, pp.23-32,
2001.

Secondary Radix Multiplier Designs

In this section we propose multiplier designs on the basis of the secondary radix recoding
schemes from the previous section. The proposed recoding schemes share the following features,
that will be utilized in the designs: There are very few digits to be considered in the secondary
radix representations. All digits in the secondary radix system are powers of two. All weights in
the secondary radix system can be computed by two or three term sums.

If $p''$ denotes the number of digits that are required in the secondary radix representation
of a higher radix digit, then $\lceil (p+1)/k \rceil \cdot p''$ digits are required to represent the multiplier $\langle b \rangle$.
Thus, in comparison with binary the number of digits is reduced by roughly $k/p''$ which is 5/2,
8/3 or 3 in the cases we consider. Additionally these digits are simple multiples, and even the
multiplication by the weights can be computed by simple sums. This is not very different from
the properties of Booth recoding radix-8. The main new flexibility for the implementations is
given by the following properties: Each digit depends on only one odd multiple which could be
1 and is known at design time. Only some of the digits in the secondary radix have to be
weighted by a 'hard' multiple. These 'hard' multiples are computed unconditionally, they do not
depend on the value of the digits in the secondary radix. The low order digits do not have to deal
with 'hard' multiples.
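As a worked instance of the digit count, assume p = 64 (the operand width used elsewhere in this description) and the two systems named in the summary: primary radix 32 with two radix-7 digits per digit, and primary radix 256 with three radix-11 digits per digit.

```latex
\begin{aligned}
k = 5,\ p'' = 2:\quad & \left\lceil \tfrac{65}{5} \right\rceil \cdot 2 = 13 \cdot 2 = 26 \text{ digits}, & \tfrac{k}{p''} &= \tfrac{5}{2} = 2.5 \\
k = 8,\ p'' = 3:\quad & \left\lceil \tfrac{65}{8} \right\rceil \cdot 3 = 9 \cdot 3 = 27 \text{ digits}, & \tfrac{k}{p''} &= \tfrac{8}{3} \approx 2.67
\end{aligned}
```

Both counts compare with 64 digits for plain binary; at this width the rounding in the ceiling leaves the radix-256 system with one more digit than the radix-32 system even though its asymptotic ratio is higher.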

Based on these properties we suggest two basic architectures for the design of partial 
product generation and reduction as illustrated in Figures 2 A and 2B: 
Architecture I: Pre-Computation of Hard Multiples. Referring to Figure 2A, the
multiplications by the weights of the secondary representation are computed on the multiplicand
$\langle a \rangle$. In parallel the multiplier $\langle b \rangle$ is recoded and the partial products corresponding to the low
order digits (which do not have to deal with any hard multiples) are generated and can already be
partially reduced. These are combined with the remaining partial products in a second partial
product reduction step.

Architecture II: Post-Computation of Hard Multiples. Referring to Figure 2B, after
recoding the multiplier $\langle b \rangle$ into the digits of the secondary radix system, the multiples of the
multiplicand $\langle a \rangle$ by these digits (note that these are only multiples by powers of two) are
generated. The terms that we get from this selection are accumulated separately in groups that
share the same weights of the corresponding digits in the secondary radix system. The carry-save
representation of the sum of each of these groups is then multiplied by the corresponding
weight (note that these multiples can be computed by simple sums). The results are accumulated
to get the carry-save representation of the product in a final partial product reduction step.

Radix-32 Architecture of the Multiplier

The encoding scheme for the proposed implementations is based on a radix-32 signed
digit representation of the multiplier:

$$\langle b \rangle = \sum_{i=0}^{p'-1} d_i \cdot 32^i$$

so that the multiplier is represented by $p' = \lceil (p+1)/5 \rceil$ radix-32 digits $d_i \in \{-16, -15, \ldots, 16\}$.
Corresponding to Booth recoding a canonical choice for the digits $d_i$ is computed from the binary
representation by

$$d_i = -16\,b[5i+4] + 8\,b[5i+3] + 4\,b[5i+2] + 2\,b[5i+1] + b[5i] + b[5i-1].$$

As suggested in the previous section each radix-32 digit $d_i$ can be represented by two digits in the
secondary radix-7:

$$d_i = 7 \cdot d_{i,1} + d_{i,0}$$

where both $d_{i,1}$ and $d_{i,0}$ are zero or a signed power of two. The high order radix-7 digits $d_{i,1}$ can
only have values from the set {-2,-1,0,1,2} and the low order radix-7 digits $d_{i,0}$ can only have the
values {-4,-2,-1,0,1,2,4}. We will discuss some options for the recoding of the secondary radix digits
separately in the following section.

Without considering the weight of 7 this gives us two groups of $\lceil (p+1)/5 \rceil$ partial
products, each of which can be generated very easily. For the group of partial products generated
by the low order digits $d_{i,0}$ these are already the final values for the partial products.

The group of partial products generated by the high order digits $d_{i,1}$ additionally have to
be multiplied by 7. There are two options where this multiplication could be computed: On one
hand the multiplicand $\langle a \rangle$ can be multiplied by 7 before the partial product reduction which
leads to Architecture I. Figure 2A depicts a block diagram corresponding to this radix-32
implementation using the pre-computation of the $7 \times \langle a \rangle$ multiple. The multiplication of $\langle a \rangle$ by 7 is
computed by the following sum, taken modulo $2^{p+3}$:

$$7 \cdot \langle a \rangle = 8 \cdot \langle a \rangle - \langle a \rangle = \langle a[p-1:0], 000 \rangle + \langle 111, \overline{a[p-1:0]} \rangle + 1.$$
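The shift and complement form of the 7x multiple can be sketched at the bit level as follows. The (p+3)-bit working width and the name `seven_times` are assumptions made for this illustration; the specification does not fix an adder width.

```python
def seven_times(a, p):
    """Compute 7*a for an unsigned p-bit a as (a << 3) + (~a) + 1 in p+3 bits."""
    width = p + 3
    mask = (1 << width) - 1
    shifted = (a << 3) & mask       # <a[p-1:0], 000>
    complemented = ~a & mask        # <111, complement of a[p-1:0]>
    return (shifted + complemented + 1) & mask  # carry out of bit p+2 is discarded


print(seven_times(5, 4))  # 35
```

The three leading ones are the sign extension of the one's complement of a, and the trailing +1 completes the two's complement negation, so a single carry-propagate addition yields 8a - a.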

On the other hand the group of partial products generated by the high order digits could be
multiplied by 7 after these partial products already have been compressed to a carry-save
representation which corresponds to Architecture II. Figure 2B depicts a block diagram
corresponding to this radix-32 implementation using the post-computation of the 7x multiple.
Also in this case the formula $7 \cdot x = 8 \cdot x - x$ is used to compute the 7x multiple, but this time it is
not computed using $\langle a \rangle$, but it is computed using the carry-save representation of the sum of the
terms that have been generated by the high order digits. In this way the 4 partial products from
the carry-save representations of the two groups are extended to 6 partial products, which are
then reduced to the carry-save representation of the product in a final 6:2 reduction step. Note
that for the implementation of Architecture II the input of the multiplicand $\langle a \rangle$ is required later
than the input of the multiplier $\langle b \rangle$. With the partitioning suggested in Figure 2B this makes a
difference of a whole cycle in which the second operand is not needed. An operand that is fed
back from a multiplier result only requires one cycle in the partial product generation and
reduction for this proposed partitioning.

Radix-32 Recoding Options 

The overall goal of recoding is to obtain the final operand encoding, which for the 
secondary radix representation from the previous section may be stated and solved as follows: 

Primary radix-32, Secondary radix-7 Recoding: Given the 6-bit input $b_4 b_3 b_2 b_1 b_0 b_{-1}$, determine
the 7-bit output $s_7 x_{14} x_7 s_1 x_4 x_2 x_1$ as specified in Table 2, where $-16 \le d \le 16$ is given by

$$d = (-1)^{s_7} \cdot (14 x_{14} + 7 x_7) + (-1)^{s_1} \cdot (4 x_4 + 2 x_2 + x_1) = -16 b_4 + 8 b_3 + 4 b_2 + 2 b_1 + b_0 + b_{-1}.$$

A first solution is a direct table lookup.

I. Direct Table Lookup Solution: Note that $s_7 = b_4$, and use $b_4 b_3 b_2 b_1 b_0 b_{-1}$ as input to a 6-bits-in
6-bits-out lookup table to obtain $x_{14} x_7 s_1 x_4 x_2 x_1$.

Time: 6-bit lookup

Table Size: 48 bytes per radix-32 digit, with 624 bytes for parallel 13 radix-32 digit
operand recoding of a 64 bit operand.

Note that negation of $d$ is obtained by complementing the input $b_4 b_3 b_2 b_1 b_0 b_{-1}$, which
complements the output signs but not the output magnitude, as may be observed in the top to
bottom symmetry of Table 2. This provides a solution halving the table size.
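A minimal sketch of the direct lookup table and its complement symmetry is given below. The names are hypothetical, and one choice is an assumption of this sketch rather than part of Table 2: where the table marks $s_1$ as don't care, the entry is filled with $b_4$ so that the sign bit complements along with the input.

```python
from itertools import product

WEIGHTS = (-16, 8, 4, 2, 1, 1)
LOW = (-4, -2, -1, 0, 1, 2, 4)


def split(d):
    """Unique (hi, lo) with d == 7*hi + lo, hi in -2..2 and lo in LOW."""
    return next((hi, d - 7 * hi) for hi in range(-2, 3) if d - 7 * hi in LOW)


def build_direct_table():
    """Map (b4, b3, b2, b1, b0, b_-1) to (x14, x7, s1, x4, x2, x1)."""
    table = {}
    for bits in product((0, 1), repeat=6):
        hi, lo = split(sum(w * b for w, b in zip(WEIGHTS, bits)))
        s1 = int(lo < 0) if lo else bits[0]  # don't-care resolved to b4 (assumption)
        table[bits] = (int(abs(hi) == 2), int(abs(hi) == 1), s1,
                       int(abs(lo) == 4), int(abs(lo) == 2), int(abs(lo) == 1))
    return table


table = build_direct_table()
print(len(table) * 6 // 8)  # 48 bytes per radix-32 digit
```

Complementing all six input bits maps each entry to one with identical magnitude bits and an inverted $s_1$, which is the property the half-size table exploits.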

II. Complementary Table Lookup Solution:

(i) Use bit $b_4$ to conditionally complement the string $b_3 b_2 b_1 b_0 b_{-1}$.

(ii) Use the conditionally complemented string as input to a 5-bits-in 6-bits-out lookup-table
to obtain $x_{14} x_7 s'_1 x_4 x_2 x_1$.

(iii) Set $s_7 = b_4$ and $s_1 = b_4 \oplus s'_1$.

Time: An XOR-gate plus 5-bit lookup, assuming the subsequent XOR-gate determining $s_1$ is
computed off the critical path since a PPG can select the magnitude before it complements.

Table Size: 24 bytes per radix-32 digit, with 312 bytes for a 64-bit operand recoding.
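Steps (i) to (iii) of the complementary lookup can be sketched as follows. The table construction and all function names are illustrative assumptions; only the three steps themselves come from the description above.

```python
from itertools import product

LOW = (-4, -2, -1, 0, 1, 2, 4)


def build_half_table():
    """32 entries for b4 == 0: (b3, b2, b1, b0, b_-1) -> (x14, x7, s1', x4, x2, x1)."""
    table = {}
    for key in product((0, 1), repeat=5):
        d = sum(w * b for w, b in zip((8, 4, 2, 1, 1), key))  # 0 <= d <= 16
        hi = next(h for h in (0, 1, 2) if d - 7 * h in LOW)
        lo = d - 7 * hi
        table[key] = (int(hi == 2), int(hi == 1), int(lo < 0),
                      int(abs(lo) == 4), int(abs(lo) == 2), int(abs(lo) == 1))
    return table


def recode(bits6, table):
    """bits6 = (b4, b3, b2, b1, b0, b_-1); returns (s7, x14, x7, s1, x4, x2, x1)."""
    b4 = bits6[0]
    key = tuple(b ^ b4 for b in bits6[1:])    # step (i): conditional complement
    x14, x7, s1p, x4, x2, x1 = table[key]     # step (ii): 5-bit lookup
    return b4, x14, x7, b4 ^ s1p, x4, x2, x1  # step (iii): restore the signs


def decode(code):
    """Evaluate the digit value from the 7-bit encoding."""
    s7, x14, x7, s1, x4, x2, x1 = code
    return (-1) ** s7 * (14 * x14 + 7 * x7) + (-1) ** s1 * (4 * x4 + 2 * x2 + x1)
```

Decoding `recode(bits6, build_half_table())` returns the radix-32 digit value for each of the 64 possible inputs, while the table holds 32 entries of 6 bits, i.e. 24 bytes.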

The table may be compressed by 1/3 and decompressed at a cost of one logic level. 

III. Complementary Compressed Table Lookup Solution:

(i) Use bit $b_4$ to conditionally complement the string $b_3 b_2 b_1 b_0 b_{-1}$.

(ii) Use the conditionally complemented string as input to a 5-bits-in 4-bits-out lookup-table
to obtain $x_7\, s'_1\, b'_1\, b'_0$.

(iii) Determine the output bits as follows:

$$s_7 = b_4, \qquad x_{14} = \overline{x_7} \cdot (b_4 \oplus b_3), \qquad s_1 = b_4 \oplus s'_1,$$

$$x_4 = b'_1 \cdot b'_0, \qquad x_2 = b'_1 \cdot \overline{b'_0}, \qquad x_1 = \overline{b'_1} \cdot b'_0.$$

Time: Two logic levels plus 5-bit lookup.

Table Size: 16 bytes per radix 32 digit, with 208 bytes for a 64-bit operand recoding.

It should be clear to one skilled in the art that other secondary radix systems can be
employed according to this invention. A system with primary radix 256 and secondary radix 11
is readily constructed. Alternatively to representing the signed radix-256 digits with the fixed
secondary radix-11, one could also choose a mixed radix representation for the signed digits $d_i$.
For example also in the mixed secondary radix system 55-11-1, where the three digits $d_{i,2}$, $d_{i,1}$
and $d_{i,0}$ have the weights 55, 11 and 1, all digits can be chosen to be either a power of two or
zero. More details on the foundations for such choices are given in the paper [Proc. IEEE Int.
Symp. on Comp. Arith. (Arith15), Binary Multiplication Radix-32 and Radix-256, pp.23-32,
2001].

Figure 3 is a block diagram of one circuit embodiment capable of performing the method
of multiplication of the current invention. Referring to Figure 3, a circuit, indicated generally as
30, is shown having received the 64-bit multiplier in the A-Latch 33 and the 64-bit
multiplicand in the B-Latch 35. The A-Latch has an additional 0 bit at each end extending from
$a_{-1}$ to $a_{64}$. For $i = 0 \ldots 12$, a 6-tuple of bits $(a_{5i-1}, a_{5i}, a_{5i+1}, a_{5i+2}, a_{5i+3}, a_{5i+4})$ is extracted for
recoding in the secondary radix to obtain the 2 radix 7 digits $d_{1,i}$, $d_{0,i}$ according to the recoding
Table 2. Thus bits $(a_{-1}, a_0, a_1, a_2, a_3, a_4)$ are sent to recoder 49 to obtain $d_{1,0}$ and also are sent
to recoder 62 to obtain $d_{0,0}$; and so on with bits $(a_{59}, a_{60}, a_{61}, a_{62}, a_{63}, a_{64})$ sent to recoder 37 to
obtain $d_{1,12}$ and also sent to recoder 50 to obtain $d_{0,12}$.

The recoded digit $d_{1,0}$ denoting a value in the set {-2,-1,0,1,2} from Recoder 49 is sent
to the PPG selector 75 which also receives the 64 bit multiplicand from the B-Latch 35. The
PPG operates as a standard Booth 4 PPG generating the value $d_{1,0} \times$ (multiplicand) and sends it
to the appropriate adder input of the left half of the multiplier's adder tree. Similarly, the digits
$d_{1,1}, d_{1,2}, \ldots, d_{1,12}$ are sent to corresponding PPGs (63-75) for selection of 13 partial products to be
summed in the left half adder tree 91. The output of the 13:2 PPR tree 91 is in redundant binary
form and is multiplied by 7 by sending 8 times the value (shift by 3) and minus 1 times the value
(complement) each as a redundant binary value into the 6:2 PPR tree 95.

The recoded digit $d_{0,0}$ denoting a value in the set {-4,-2,-1,0,1,2,4} from Recoder 62 is
sent to the PPG selector 88 which also receives the 64 bit multiplicand from the B-Latch 35. The
PPG operates as a modified Booth 8 PPG (modified to exclude digits 3, -3) generating the value
$d_{0,0} \times$ (multiplicand) and sends it to the appropriate adder input of the right half of the
multiplier's adder tree. Similarly, the digits $d_{0,1}, d_{0,2}, \ldots, d_{0,12}$ are sent to corresponding PPGs
(76-88) for selection of 13 partial products to be summed in the right half adder tree 93. The
output of the 13:2 PPR tree 93 is in redundant binary form and is sent in redundant form to the
6:2 PPR tree 95. The output of the 6:2 PPR tree 95 in redundant binary form is sent to carry
propagate adder 97 to finish the partial product accumulation and provide the 128 bit product on
output line 98.
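The data flow of Figure 3 can be summarised in a short behavioural model. This is a sketch under stated assumptions: plain integer sums stand in for the two 13:2 trees, the 6:2 tree and the carry propagate adder, and the function names are hypothetical rather than taken from the figure.

```python
LOW = (-4, -2, -1, 0, 1, 2, 4)


def recode_multiplier(multiplier, p=64):
    """Return (hi_digits, lo_digits): one radix-7 digit pair per 6-bit window."""
    def bit(j):
        return (multiplier >> j) & 1 if 0 <= j < p else 0  # a_-1 and a_p read as 0

    hi_digits, lo_digits = [], []
    for i in range(-(-(p + 1) // 5)):  # 13 windows for p = 64
        d = (-16 * bit(5 * i + 4) + 8 * bit(5 * i + 3) + 4 * bit(5 * i + 2)
             + 2 * bit(5 * i + 1) + bit(5 * i) + bit(5 * i - 1))
        hi = next(h for h in range(-2, 3) if d - 7 * h in LOW)
        hi_digits.append(hi)
        lo_digits.append(d - 7 * hi)
    return hi_digits, lo_digits


def multiply(multiplier, multiplicand, p=64):
    hi, lo = recode_multiplier(multiplier, p)
    left = sum(d * multiplicand * 32 ** i for i, d in enumerate(hi))   # left-half tree
    right = sum(d * multiplicand * 32 ** i for i, d in enumerate(lo))  # right-half tree
    return 8 * left - left + right  # times 7 applied after the left-half sum
```

For 64-bit operands each digit list has 13 entries, so 26 partial products are selected, and the multiplicand reaches every PPG in a single unmodified form.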

Figure 4 provides a block diagram of a further embodiment of our invention. The
multiplier circuit of Figure 4 computes 7x as a precomputation step before selection of the partial
products for insertion into the adder tree of the multiplier. The design is similar to Booth radix 8
where the 3x computation precedes selection by the PPGs. The principal advantage is apparent
in that only one version of the multiplicand is sent to each PPG so that the routing complexity is
similar to that of a Booth radix 4 multiplier, while the size of the adder tree is reduced by
over 15% compared to Booth radix 4.

In conclusion the present invention provides a new methodology for multiplier recodings
reducing the size of the multiplier's adder tree without requiring two or more hard multiples of
the multiplicand to be sent to each PPG.

Although the present invention has been described in detail with regard to a particular
secondary radix system, it should be understood that various changes, substitutions, and
alterations in the secondary radix recoding process can be made hereto without departing from
the invention as defined by the appended claims.
ABSTRACT

A circuit and methodology for higher radix multiplication with improved partial product 
generation. The invention relates to the design of a high precision multiplier for an arithmetic 
unit of a digital processor. 
Figure 1(a): Block diagram of partial product generation for prior art Booth radix-4
multiplier

Figure 1(b): Block diagram of partial product generation for prior art Booth radix-8
multiplier

Figure 2(a): Simplified block diagram of partial product generation for radix-32 recoding with
precomputation of the 7x multiple

Figure 2(b): Simplified block diagram of partial product generation for preferred embodiment
radix-32 recoding with postcomputation of the 7x multiple

Figure 3: Block diagram of the preferred embodiment of the invention for a multiplier bit
width of p = 64.

Figure 4: Block diagram of the precomputation option illustrating a further embodiment of
the invention for a multiplier bit width of p = 64.